R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.3.2 fastmap_1.1.1 cli_3.6.2
[5] tools_4.3.2 htmltools_0.5.7 rstudioapi_0.15.0 yaml_2.3.8
[9] rmarkdown_2.25 knitr_1.45 jsonlite_1.8.8 xfun_0.41
[13] digest_0.6.33 rlang_1.1.2 evaluate_0.23
1 Filling gaps in lecture notes (10pts)
Consider the regression model \[
Y = f(X) + \epsilon,
\] where \(\operatorname{E}(\epsilon) = 0\).
1.1 Optimal regression function
Show that the choice \[
f_{\text{opt}}(X) = \operatorname{E}(Y | X)
\] minimizes the mean squared prediction error \[
\operatorname{E}\{[Y - f(X)]^2\},
\] where the expectation averages over variations in both \(X\) and \(Y\). (Hint: condition on \(X\).)
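Write \(f_{\text{opt}}(X) = \operatorname{E}(Y | X)\). Adding and subtracting \(f_{\text{opt}}(X)\) inside the square gives \[
\operatorname{E}\{[Y - f(X)]^2\} = \operatorname{E}\{[Y - f_{\text{opt}}(X)]^2\} + \operatorname{E}\{[f_{\text{opt}}(X) - f(X)]^2\} + 2\operatorname{E}\{[Y - f_{\text{opt}}(X)][f_{\text{opt}}(X) - f(X)]\}.
\] Conditioning on \(X\), the cross term vanishes because \(\operatorname{E}[Y - f_{\text{opt}}(X) | X] = 0\), so \[
\operatorname{E}\{[Y - f(X)]^2\} = \operatorname{E}\{[Y - f_{\text{opt}}(X)]^2\} + \operatorname{E}\{[f_{\text{opt}}(X) - f(X)]^2\}.
\] The first term does not depend on \(f\), and the second term is nonnegative and equals zero when \(f = f_{\text{opt}}\). Therefore, \(f_{\text{opt}}(X) = \operatorname{E}(Y | X)\) minimizes the mean squared prediction error.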
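The argument above can be checked numerically. The following is a minimal sketch on simulated data; the true conditional mean `sin(x)`, the competing function `f_alt`, and the noise level are all assumptions chosen for illustration, not part of the exercise.

```r
# Simulated check that E(Y | X) attains the smallest mean squared prediction error.
set.seed(1)
n <- 1e5
x <- runif(n, -2, 2)

f_opt <- function(x) sin(x)             # assumed true conditional mean E(Y | X)
f_alt <- function(x) sin(x) + 0.3 * x   # an arbitrary competing predictor

y <- f_opt(x) + rnorm(n, sd = 0.5)      # Y = f(X) + epsilon with E(epsilon) = 0

mse <- function(f) mean((y - f(x))^2)
mse_opt <- mse(f_opt)
mse_alt <- mse(f_alt)

# Second term of the decomposition: E{[f_opt(X) - f(X)]^2}
gap <- mean((f_opt(x) - f_alt(x))^2)

c(mse_opt = mse_opt, mse_alt = mse_alt, mse_opt_plus_gap = mse_opt + gap)
```

Up to Monte Carlo error, `mse_alt` should agree with `mse_opt + gap`, and `mse_opt` should be the smaller of the two errors.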
1.2 Bias-variance trade-off
Given an estimate \(\hat f\) of \(f\), show that the test error at a point \(x_0\) can be decomposed as \[
\operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}},
\] where the expectation averages over the variability in \(y_0\) and \(\hat f\).
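Write \(y_0 = f(x_0) + \epsilon\). Because we assume that \(\hat f(x_0)\) and \(\epsilon\) are independent and \(\operatorname{E}(\epsilon) = 0\), we have \(\operatorname{E}\{[f(x_0) - \hat f(x_0)]\epsilon\} = 0\). So we have \[
\operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + \operatorname{Var}(\epsilon) = \operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\epsilon),
\] where the last step expands \(\operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\}\) around \(\operatorname{E}[\hat f(x_0)]\); the cross term vanishes because \(\operatorname{E}\{\hat f(x_0) - \operatorname{E}[\hat f(x_0)]\} = 0\).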
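As a rough numerical check of the decomposition stated in 1.2, the sketch below repeatedly draws training sets, refits a model, and compares the simulated test error at \(x_0\) with Variance + Squared bias + \(\operatorname{Var}(\epsilon)\). The true function `sin(2 * x)`, the noise level, and the deliberately inflexible linear fit are assumptions made only for illustration.

```r
# Monte Carlo check of the bias-variance decomposition at a single point x0.
set.seed(2)
f <- function(x) sin(2 * x)   # assumed true regression function
sigma <- 0.3                  # assumed sd of epsilon
x0 <- 0.5
n_train <- 50
n_rep <- 2000

# Refit on a fresh training set each time and record the prediction at x0.
fhat_x0 <- unname(replicate(n_rep, {
  x <- runif(n_train, -1, 1)
  y <- f(x) + rnorm(n_train, sd = sigma)
  fit <- lm(y ~ x)            # a straight line, so some bias is expected
  predict(fit, newdata = data.frame(x = x0))
}))

# Independent test responses at x0.
y0 <- f(x0) + rnorm(n_rep, sd = sigma)

test_error <- mean((y0 - fhat_x0)^2)
variance   <- mean((fhat_x0 - mean(fhat_x0))^2)
bias_sq    <- (mean(fhat_x0) - f(x0))^2
decomposed <- variance + bias_sq + sigma^2

c(test_error = test_error, decomposed = decomposed)
```

The two numbers should agree up to simulation noise; replacing `lm(y ~ x)` with a more flexible fit would be expected to shift error from the bias term to the variance term.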
Squared Bias: The discrepancy between the model’s average approximation and the true underlying function. As flexibility increases, the model can track the true function more closely, so the squared bias decreases.
Variance: For a model with minimal flexibility, the variance is close to zero, because the fit barely depends on the particular training data. As flexibility increases, the fit increasingly captures the noise in a particular training set, so the variance curve increases monotonically with flexibility.
Training Error: The training error is the average squared difference between model predictions and the training observations. For very inflexible models this difference can be substantial, but with increasing flexibility (e.g., by fitting higher-degree polynomials) the additional degrees of freedom reduce the average difference, so the training error decreases.
Bayes Error: This term remains constant because it is the irreducible error \(\operatorname{Var}(\epsilon)\), which does not depend on the fitted model and is therefore unaffected by its flexibility.
Test Error: The expected test error is the sum Variance + Squared bias + Bayes error. It attains its minimum at an intermediate level of flexibility: neither too flexible, where variance dominates, nor too inflexible, where squared bias is high. The test error curve resembles a somewhat deformed U shape: initially high for inflexible models, decreasing to a minimum as flexibility increases, and then increasing as variance starts to dominate. The distance between this minimum and the Bayes irreducible error indicates how well the best function in the hypothesis space can fit.
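The behaviour of the training and test error curves can be sketched with a small simulation. The data-generating function, noise level, sample sizes, and the use of polynomial degree as the measure of flexibility are assumptions for illustration only.

```r
# Training and test MSE as polynomial degree (flexibility) increases.
set.seed(3)
f <- function(x) sin(2 * x)   # assumed true function
sigma <- 0.4                  # assumed noise sd

train <- data.frame(x = runif(60, -2, 2))
train$y <- f(train$x) + rnorm(60, sd = sigma)
test <- data.frame(x = runif(5000, -2, 2))
test$y <- f(test$x) + rnorm(5000, sd = sigma)

degrees <- 1:10
errors <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d,
    train  = mean(residuals(fit)^2),
    test   = mean((test$y - predict(fit, newdata = test))^2))
}))

errors
```

Because the polynomial models are nested, the training MSE cannot increase with the degree, whereas the test MSE typically falls and then rises again; where the minimum lands depends on the simulated data.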
3 ISL Exercise 2.4.4 (10pts)
Classification Applications:
Medical diagnosis. Response: disease present or absent. Predictors: symptoms, test results, patient history, etc. Goal: Inference aiding in diagnosis and treatment planning.
Spam detection. Response: spam or not spam. Predictors: email contents, email sender, etc. Goal: Prediction of spam.
Face recognition. Response: identity of face. Predictors: picture of face, lighting, angle, etc. Goal: Prediction of identity.
Regression Applications:
Survival analysis (e.g., with a Cox proportional hazards model). Response: the time until an event occurs (survival time). Predictors: covariates or features that may influence the hazard rate over time. Goal: Prediction of survival time.
Stock market prediction. Response: price of stock. Predictors: company performance, economic indicators, etc. Goal: Prediction of stock price.
Educational assessment. Response: student’s grade. Predictors: student’s performance on homework, quizzes, etc. Goal: Prediction of student’s grade.
Cluster Analysis Applications:
Market segmentation. Response: market segment. Predictors: customer characteristics, purchasing history, etc. Goal: Identification of distinct groups of customers.
Social network analysis. Response: community. Predictors: social network connections, interests, etc. Goal: Identification of distinct groups of people.
Image segmentation. Response: object. Predictors: pixel color, pixel location, etc. Goal: Identification of distinct objects in an image.
library(ISLR2)
cat("Number of rows:", nrow(Boston), "\n")
Number of rows: 506
cat("Number of columns:", ncol(Boston), "\n")
Number of columns: 13
Rows: Each row corresponds to a single observation or data point. In this case, each row represents information about a specific suburb in Boston.
Columns: Each column represents a different variable or feature associated with the observations. In this case, each column describes a specific aspect of housing in these suburbs, such as ‘crim’ (per capita crime rate by town), ‘zn’ (proportion of residential land zoned for lots over 25,000 sq.ft.), etc.
crim: per capita crime rate by town.
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town.
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox: nitrogen oxides concentration (parts per 10 million).
rm: average number of rooms per dwelling.
age: proportion of owner-occupied units built prior to 1940.
dis: weighted mean of distances to five Boston employment centres.
rad: index of accessibility to radial highways.
tax: full-value property-tax rate per $10,000.
ptratio: pupil-teacher ratio by town.
lstat: lower status of the population (percent).
medv: median value of owner-occupied homes in $1000s.
The correlation coefficient between nox and indus is 0.764, statistically significant at the 0.001 level.
This positive correlation suggests a strong linear relationship, indicating that as the concentration of nitrogen oxides increases, the proportion of non-retail business acres also tends to increase.
It does not imply causation but this positive correlation may be attributed to factors such as concentration of industrial activities, urban planning, land use, and environmental policies which may need further analysis.
The correlation coefficient between medv and lstat is -0.738, statistically significant at the 0.001 level.
This negative correlation suggests a strong linear relationship, indicating that as the median value of homes decreases, the lower status of the population tends to increase.
In other words, areas with higher proportions of lower-status populations tend to have lower median home values.
The correlation coefficient between tax and indus is 0.721, statistically significant at the 0.001 level.
This positive correlation suggests that towns with a higher proportion of non-retail business acres tend to have higher property-tax rates.
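The correlations and significance levels quoted above can be reproduced with `cor.test()`. The helper name `cor_report` is hypothetical, and the sketch assumes the `Boston` data come from the ISLR2 package, as loaded earlier in this report.

```r
# Pearson correlation and p-value for one pair of columns in a data frame.
cor_report <- function(data, x, y) {
  ct <- cor.test(data[[x]], data[[y]])   # Pearson correlation by default
  data.frame(x = x, y = y,
             r = unname(ct$estimate),
             p_value = ct$p.value)
}

# Usage on the three pairs discussed above (skipped if ISLR2 is not installed).
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Boston <- ISLR2::Boston
  pairs_to_check <- list(c("nox", "indus"), c("medv", "lstat"), c("tax", "indus"))
  print(do.call(rbind, lapply(pairs_to_check,
                              function(p) cor_report(Boston, p[1], p[2]))))
}
```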
4.3 c
Negative Relationships:
zn: As proportion of residential land zoned for lots over 25,000 sq.ft. increases, per capita crime rate tends to decrease.
rm: An increase in the average number of rooms per dwelling is associated with a decrease in per capita crime rate.
dis: Per capita crime rate decreases as the weighted mean distance to employment centres increases.
medv: Higher median home values are associated with lower per capita crime rates.
Positive Relationships:
indus: An increase in non-retail business acreage is associated with an increase in per capita crime rate.
nox: Higher nitrogen oxides concentration is associated with higher per capita crime rates.
age: Areas with a higher proportion of older buildings tend to have higher per capita crime rates.
rad: Higher accessibility to radial highways is associated with higher per capita crime rates.
tax: Areas with higher property tax rates tend to have higher per capita crime rates.
lstat: An increase in the lower status of the population is associated with higher per capita crime rates.
The majority of towns have very low crime rates, roughly between zero and five.
Some areas exhibit very high crime rates, exceeding 70. Outliers range from about 10 to above 80, but most of the outlying towns sit toward the lower end of that range.
Overall, the data ranges from 0 to above 80.
Full-value property-tax rate per $10,000:
No outliers are observed in property tax rates.
The median is 330, well below the midpoint of the range (187 to 711), which suggests right-skewed data.
Pupil-teacher ratio by town:
Outliers are present in the lower extreme of the box plot.
The data ranges from 12.6 to 22. The median value for pupil-teacher ratio is around 19.
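The ranges, medians, and outliers read off the box plots can also be computed directly. In this sketch the helper name `spread_summary` is hypothetical; it relies on `boxplot.stats()`, which uses the same 1.5 × IQR rule as `boxplot()` by default.

```r
# Range, median, and box-plot outliers for a numeric vector.
spread_summary <- function(x) {
  stats <- boxplot.stats(x)   # default coef = 1.5, matching boxplot()
  list(range = range(x),
       median = median(x),
       n_outliers = length(stats$out),
       outliers = sort(stats$out))
}

# Usage on the three variables discussed above (skipped if ISLR2 is not installed).
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Boston <- ISLR2::Boston
  str(lapply(Boston[c("crim", "tax", "ptratio")], spread_summary))
}
```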
4.5 e
table(Boston$chas)
0 1
471 35
The table above shows that 35 suburbs bound the Charles River.
4.6 f
median(Boston$ptratio)
[1] 19.05
The median pupil-teacher ratio among the towns in this data set is 19.05.
crim zn indus chas nox
Min. : 0.00632 Min. : 0.00 Min. : 0.46 0:471 Min. :0.3850
1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1: 35 1st Qu.:0.4490
Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.5380
Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.5547
3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.6240
Max. :88.97620 Max. :100.00 Max. :27.74 Max. :0.8710
rm age dis rad
Min. :3.561 Min. : 2.90 Min. : 1.130 Min. : 1.000
1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100 1st Qu.: 4.000
Median :6.208 Median : 77.50 Median : 3.207 Median : 5.000
Mean :6.285 Mean : 68.57 Mean : 3.795 Mean : 9.549
3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188 3rd Qu.:24.000
Max. :8.780 Max. :100.00 Max. :12.127 Max. :24.000
tax ptratio lstat medv
Min. :187.0 Min. :12.60 Min. : 1.73 Min. : 5.00
1st Qu.:279.0 1st Qu.:17.40 1st Qu.: 6.95 1st Qu.:17.02
Median :330.0 Median :19.05 Median :11.36 Median :21.20
Mean :408.2 Mean :18.46 Mean :12.65 Mean :22.53
3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:16.95 3rd Qu.:25.00
Max. :711.0 Max. :22.00 Max. :37.97 Max. :50.00
summary(Boston_gt_8rooms)
crim zn indus chas nox
Min. :0.02009 Min. : 0.00 Min. : 2.680 0:11 Min. :0.4161
1st Qu.:0.33147 1st Qu.: 0.00 1st Qu.: 3.970 1: 2 1st Qu.:0.5040
Median :0.52014 Median : 0.00 Median : 6.200 Median :0.5070
Mean :0.71879 Mean :13.62 Mean : 7.078 Mean :0.5392
3rd Qu.:0.57834 3rd Qu.:20.00 3rd Qu.: 6.200 3rd Qu.:0.6050
Max. :3.47428 Max. :95.00 Max. :19.580 Max. :0.7180
rm age dis rad
Min. :8.034 Min. : 8.40 Min. :1.801 Min. : 2.000
1st Qu.:8.247 1st Qu.:70.40 1st Qu.:2.288 1st Qu.: 5.000
Median :8.297 Median :78.30 Median :2.894 Median : 7.000
Mean :8.349 Mean :71.54 Mean :3.430 Mean : 7.462
3rd Qu.:8.398 3rd Qu.:86.50 3rd Qu.:3.652 3rd Qu.: 8.000
Max. :8.780 Max. :93.90 Max. :8.907 Max. :24.000
tax ptratio lstat medv
Min. :224.0 Min. :13.00 Min. :2.47 Min. :21.9
1st Qu.:264.0 1st Qu.:14.70 1st Qu.:3.32 1st Qu.:41.7
Median :307.0 Median :17.40 Median :4.14 Median :48.3
Mean :325.1 Mean :16.36 Mean :4.31 Mean :44.2
3rd Qu.:307.0 3rd Qu.:17.40 3rd Qu.:5.12 3rd Qu.:50.0
Max. :666.0 Max. :20.20 Max. :7.44 Max. :50.0
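These findings suggest that the 13 census tracts averaging more than eight rooms per dwelling compare favorably with the full data set on most indicators: a lower mean crime rate (0.72 versus 3.61), a smaller industrial share (mean indus 7.08 versus 11.14), lower property-tax rates (mean 325.1 versus 408.2), lower pupil-teacher ratios (mean 16.36 versus 18.46), a much lower lstat (mean 4.31 versus 12.65), and much higher median home values (mean medv 44.2 versus 22.53). Zoning for large lots (zn), nitrogen oxides concentration, and distance to employment centres are close to the overall averages, and the housing stock is not newer (mean age 71.54 versus 68.57). Two of the 13 tracts (about 15%) bound the Charles River, compared with 35 of 506 (about 7%) overall.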
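A side-by-side comparison of column means makes this kind of subset summary easier to read than two separate `summary()` tables. The helper name `compare_means` is hypothetical, and the sketch assumes `Boston` comes from ISLR2 and that the subset is defined by `rm > 8`.

```r
# Means of every numeric column for a subset of rows versus the whole data frame.
compare_means <- function(data, keep) {
  numeric_cols <- data[sapply(data, is.numeric)]   # drops factors such as a recoded chas
  round(rbind(subset  = colMeans(numeric_cols[keep, , drop = FALSE]),
              overall = colMeans(numeric_cols)), 2)
}

# Usage (skipped if ISLR2 is not installed).
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Boston <- ISLR2::Boston
  print(compare_means(Boston, Boston$rm > 8))
}
```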
5 ISL Exercise 3.7.3 (12pts)
5.1 a
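Only iii is correct.
With Level coded as 1 for college and 0 for high school, the fitted difference between a college graduate and a high school graduate with the same GPA and IQ is \(\hat{\beta_3} + \hat{\beta_5} \times \text{GPA} = 35 - 10 \times \text{GPA}\) (in thousands of dollars).
The GPA × Level interaction therefore makes the effect of Level depend on GPA; it does not depend on IQ.
The difference is positive when GPA is below 3.5 and negative when GPA is above 3.5, so high school graduates earn more on average provided that the GPA is high enough.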
5.2 b
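\[
\text{Salary} = \hat{\beta_0} + \hat{\beta_1} \times \text{GPA} + \hat{\beta_2} \times \text{IQ} + \hat{\beta_3} \times \text{Level} + \hat{\beta_4} \times (\text{GPA} \times \text{IQ}) + \hat{\beta_5} \times (\text{GPA} \times \text{Level})
\] Substitute the given values (IQ = 110, GPA = 4.0, Level = 1): \[
\text{Salary} = 50 + 20 \times 4.0 + 0.07 \times 110 + 35 + 0.01 \times (4.0 \times 110) - 10 \times (4.0 \times 1) = 137.1
\] That is, a predicted starting salary of $137,100.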
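The substitution can be double-checked with a few lines of R. The function name `predict_salary` is hypothetical, and the sketch assumes Level = 1 encodes a college graduate, with the coefficients taken from the calculation above.

```r
# Fitted salary (in thousands of dollars) from the coefficients used above.
predict_salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level +
    0.01 * (gpa * iq) - 10 * (gpa * level)
}

predict_salary(gpa = 4.0, iq = 110, level = 1)
```

The call should reproduce the value 137.1 obtained by hand.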
5.3 c
False. The size of \(\hat{\beta_4}\) depends on the units of GPA and IQ, so a small estimate does not by itself mean there is little evidence of an interaction. To assess statistical significance, we would look at the p-value (or the t-statistic, the estimate divided by its standard error) associated with \(\hat{\beta_4}\) rather than at its magnitude. If the p-value is small (usually below a significance level like 0.05), it provides evidence against the null hypothesis that the interaction effect is zero.
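A small simulation illustrates that a numerically small interaction coefficient can still be highly significant. The sample size, the distributions of `gpa` and `iq`, and the noise level are assumptions chosen for illustration; only the coefficient values mirror the exercise.

```r
# A tiny interaction coefficient (0.01) that is nevertheless clearly significant.
set.seed(4)
n <- 500
gpa <- runif(n, 2, 4)              # assumed GPA distribution
iq  <- rnorm(n, mean = 100, sd = 15)   # assumed IQ distribution
y <- 50 + 20 * gpa + 0.07 * iq + 0.01 * gpa * iq + rnorm(n, sd = 0.1)

fit <- lm(y ~ gpa * iq)            # main effects plus the gpa:iq interaction
interaction_row <- coef(summary(fit))["gpa:iq", ]
interaction_row
```

The estimate stays near 0.01 while its standard error is far smaller, so the t-statistic is large; significance is driven by that ratio, not by the raw size of the coefficient.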
Call:
lm(formula = crim ~ zn, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.429 -4.222 -2.620 1.250 84.523
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
zn -0.07393 0.01609 -4.594 5.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
summary(lm.indus)
Call:
lm(formula = crim ~ indus, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-11.972 -2.698 -0.736 0.712 81.813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06374 0.66723 -3.093 0.00209 **
indus 0.50978 0.05102 9.991 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.chas)
Call:
lm(formula = crim ~ chas, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
chas1 -1.8928 1.5061 -1.257 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
summary(lm.nox)
Call:
lm(formula = crim ~ nox, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.371 -2.738 -0.974 0.559 81.728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.720 1.699 -8.073 5.08e-15 ***
nox 31.249 2.999 10.419 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.rm)
Call:
lm(formula = crim ~ rm, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.604 -3.952 -2.654 0.989 87.197
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.482 3.365 6.088 2.27e-09 ***
rm -2.684 0.532 -5.045 6.35e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
summary(lm.age)
Call:
lm(formula = crim ~ age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.789 -4.257 -1.230 1.527 82.849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
age 0.10779 0.01274 8.463 2.85e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
summary(lm.dis)
Call:
lm(formula = crim ~ dis, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.708 -4.134 -1.527 1.516 81.674
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4993 0.7304 13.006 <2e-16 ***
dis -1.5509 0.1683 -9.213 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.rad)
Call:
lm(formula = crim ~ rad, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.164 -1.381 -0.141 0.660 76.433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
rad 0.61791 0.03433 17.998 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.tax)
Call:
lm(formula = crim ~ tax, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.513 -2.738 -0.194 1.065 77.696
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
tax 0.029742 0.001847 16.10 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.ptratio)
Call:
lm(formula = crim ~ ptratio, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-7.654 -3.985 -1.912 1.825 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
ptratio 1.1520 0.1694 6.801 2.94e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
summary(lm.lstat)
Call:
lm(formula = crim ~ lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.925 -2.822 -0.664 1.079 82.862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
lstat 0.54880 0.04776 11.491 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
summary(lm.medv)
Call:
lm(formula = crim ~ medv, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.071 -4.022 -2.343 1.298 80.957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.79654 0.93419 12.63 <2e-16 ***
medv -0.36316 0.03839 -9.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
Significant associations were found between the crime rate (crim) and the following variables in the regression models: zn, indus, nox, rm, age, dis, rad, tax, ptratio, lstat, and medv.
However, there was no statistically significant association between crime rate and the chas variable.
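Rather than fitting each simple regression by hand, the slopes and p-values can be collected in a loop. The helper name `simple_fits` is hypothetical, and the sketch assumes `Boston` comes from ISLR2 with `crim` as the response.

```r
# Fit response ~ predictor for every other column and collect slope and p-value.
simple_fits <- function(data, response) {
  predictors <- setdiff(names(data), response)
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    est <- coef(summary(fit))[2, ]   # row 2 is the predictor's coefficient
    data.frame(predictor = p,
               slope = est[["Estimate"]],
               p_value = est[["Pr(>|t|)"]])
  })
  do.call(rbind, rows)
}

# Usage (skipped if ISLR2 is not installed).
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Boston <- ISLR2::Boston
  print(simple_fits(Boston, "crim"))
}
```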
6.2 b
Multiple Linear Regression Models:
model_multiple <- lm(crim ~ ., data = Boston)
summary(model_multiple)
For the predictors zn, dis, rad, and medv, we can reject the null hypothesis \(H_0: \beta_j = 0\), as their p-values are less than 0.05.
6.3 c
univariate_coefficients <- sapply(Boston[, -1], function(x) lm(crim ~ x, data = Boston)$coefficients[2])
multiple_coefficients <- coef(lm(crim ~ ., data = Boston))
coefficients_df <- data.frame(Univariate = univariate_coefficients, Multiple = multiple_coefficients[-1], Predictor = colnames(Boston)[-1])
library(ggplot2)
ggplot(coefficients_df, aes(x = Univariate, y = Multiple, label = Predictor)) +
  geom_point(position = position_jitter(width = 0.2, height = 0.1), size = 3, color = "blue", alpha = 0.7) +
  geom_text(hjust = 0, vjust = 0, size = 4) +
  labs(title = "Comparison of Univariate and Multiple Regression Coefficients", x = "Univariate Coefficients", y = "Multiple Coefficients")
6.4 d
lm_zn <- lm(crim ~ poly(zn, 3), data = Boston)
summary(lm_zn) # 1,2 orders are significant
Call:
lm(formula = crim ~ poly(zn, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.821 -4.614 -1.294 0.473 84.130
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3722 9.709 < 2e-16 ***
poly(zn, 3)1 -38.7498 8.3722 -4.628 4.7e-06 ***
poly(zn, 3)2 23.9398 8.3722 2.859 0.00442 **
poly(zn, 3)3 -10.0719 8.3722 -1.203 0.22954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
lm_indus <- lm(crim ~ poly(indus, 3), data = Boston)
summary(lm_indus) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(indus, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-8.278 -2.514 0.054 0.764 79.713
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.614 0.330 10.950 < 2e-16 ***
poly(indus, 3)1 78.591 7.423 10.587 < 2e-16 ***
poly(indus, 3)2 -24.395 7.423 -3.286 0.00109 **
poly(indus, 3)3 -54.130 7.423 -7.292 1.2e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
# lm.chas = lm(crim~poly(chas,3)) : qualitative predictor
lm_nox <- lm(crim ~ poly(nox, 3), data = Boston)
summary(lm_nox) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(nox, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.110 -2.068 -0.255 0.739 78.302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3216 11.237 < 2e-16 ***
poly(nox, 3)1 81.3720 7.2336 11.249 < 2e-16 ***
poly(nox, 3)2 -28.8286 7.2336 -3.985 7.74e-05 ***
poly(nox, 3)3 -60.3619 7.2336 -8.345 6.96e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
lm_rm <- lm(crim ~ poly(rm, 3), data = Boston)
summary(lm_rm) # 1,2 orders are significant
Call:
lm(formula = crim ~ poly(rm, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-18.485 -3.468 -2.221 -0.015 87.219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3703 9.758 < 2e-16 ***
poly(rm, 3)1 -42.3794 8.3297 -5.088 5.13e-07 ***
poly(rm, 3)2 26.5768 8.3297 3.191 0.00151 **
poly(rm, 3)3 -5.5103 8.3297 -0.662 0.50858
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
lm_age <- lm(crim ~ poly(age, 3), data = Boston)
summary(lm_age) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(age, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.762 -2.673 -0.516 0.019 82.842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3485 10.368 < 2e-16 ***
poly(age, 3)1 68.1820 7.8397 8.697 < 2e-16 ***
poly(age, 3)2 37.4845 7.8397 4.781 2.29e-06 ***
poly(age, 3)3 21.3532 7.8397 2.724 0.00668 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
lm_dis <- lm(crim ~ poly(dis, 3), data = Boston)
summary(lm_dis) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(dis, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.757 -2.588 0.031 1.267 76.378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3259 11.087 < 2e-16 ***
poly(dis, 3)1 -73.3886 7.3315 -10.010 < 2e-16 ***
poly(dis, 3)2 56.3730 7.3315 7.689 7.87e-14 ***
poly(dis, 3)3 -42.6219 7.3315 -5.814 1.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
lm_rad <- lm(crim ~ poly(rad, 3), data = Boston)
summary(lm_rad) # 1,2 orders are significant
Call:
lm(formula = crim ~ poly(rad, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.381 -0.412 -0.269 0.179 76.217
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.2971 12.164 < 2e-16 ***
poly(rad, 3)1 120.9074 6.6824 18.093 < 2e-16 ***
poly(rad, 3)2 17.4923 6.6824 2.618 0.00912 **
poly(rad, 3)3 4.6985 6.6824 0.703 0.48231
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
lm_tax <- lm(crim ~ poly(tax, 3), data = Boston)
summary(lm_tax) # 1,2 orders are significant
Call:
lm(formula = crim ~ poly(tax, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.273 -1.389 0.046 0.536 76.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3047 11.860 < 2e-16 ***
poly(tax, 3)1 112.6458 6.8537 16.436 < 2e-16 ***
poly(tax, 3)2 32.0873 6.8537 4.682 3.67e-06 ***
poly(tax, 3)3 -7.9968 6.8537 -1.167 0.244
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
lm_ptratio <- lm(crim ~ poly(ptratio, 3), data = Boston)
summary(lm_ptratio) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(ptratio, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.833 -4.146 -1.655 1.408 82.697
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.614 0.361 10.008 < 2e-16 ***
poly(ptratio, 3)1 56.045 8.122 6.901 1.57e-11 ***
poly(ptratio, 3)2 24.775 8.122 3.050 0.00241 **
poly(ptratio, 3)3 -22.280 8.122 -2.743 0.00630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
lm_lstat <- lm(crim ~ poly(lstat, 3), data = Boston)
summary(lm_lstat) # 1,2 orders are significant
Call:
lm(formula = crim ~ poly(lstat, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.234 -2.151 -0.486 0.066 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6135 0.3392 10.654 <2e-16 ***
poly(lstat, 3)1 88.0697 7.6294 11.543 <2e-16 ***
poly(lstat, 3)2 15.8882 7.6294 2.082 0.0378 *
poly(lstat, 3)3 -11.5740 7.6294 -1.517 0.1299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
lm_medv <- lm(crim ~ poly(medv, 3), data = Boston)
summary(lm_medv) # 1,2,3 orders are significant
Call:
lm(formula = crim ~ poly(medv, 3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-24.427 -1.976 -0.437 0.439 73.655
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.614 0.292 12.374 < 2e-16 ***
poly(medv, 3)1 -75.058 6.569 -11.426 < 2e-16 ***
poly(medv, 3)2 88.086 6.569 13.409 < 2e-16 ***
poly(medv, 3)3 -48.033 6.569 -7.312 1.05e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16
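Answer: Yes. For every quantitative predictor the quadratic term is significant at the 0.05 level, and for indus, nox, age, dis, ptratio, and medv the cubic term is significant as well, so there is evidence of a non-linear association with crim in each case. chas is excluded because it is a qualitative predictor, for which a polynomial fit is not meaningful. See the inline comments above.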
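An alternative to reading individual polynomial terms is a nested-model F-test comparing the linear fit with the cubic fit. This is a sketch of that approach, not the method used above; the helper name `nonlinearity_p` is hypothetical, and it assumes a numeric predictor.

```r
# p-value of the F-test comparing response ~ predictor with a degree-3 polynomial fit.
nonlinearity_p <- function(data, response, predictor, degree = 3) {
  linear <- lm(reformulate(predictor, response = response), data = data)
  poly_formula <- as.formula(sprintf("%s ~ poly(%s, %d)", response, predictor, degree))
  flexible <- lm(poly_formula, data = data)
  anova(linear, flexible)[2, "Pr(>F)"]   # small value: evidence of non-linearity
}

# Usage (skipped if ISLR2 is not installed); chas is omitted as a qualitative predictor.
if (requireNamespace("ISLR2", quietly = TRUE)) {
  Boston <- ISLR2::Boston
  quantitative <- setdiff(names(Boston), c("crim", "chas"))
  print(sapply(quantitative, function(p) nonlinearity_p(Boston, "crim", p)))
}
```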
7 Bonus question (20pts)
For multiple linear regression, show that \(R^2\) is equal to the squared correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is \[
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2.
\]
Answer: Recall that the coefficient of determination is defined as: \[R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\] where RSS is the residual sum of squares and TSS is the total sum of squares.
The total sum of squares is defined as: \[\text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2\] where \(\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i\) is the mean of the response values.
The residual sum of squares is defined as: \[\text{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2\] where \(\hat{y}_i\) is the \(i\)-th fitted value from the regression.
Now the correlation between the response vector \(\mathbf{y}\) and fitted values \(\hat{\mathbf{y}}\) is defined as: \[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^n(y_i - \bar{y})^2\sum_{i=1}^n(\hat{y}_i - \bar{\hat{y}})^2}}\]
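Note that \(\bar{\hat{y}} = \frac{1}{n}\sum_{i=1}^n \hat{y}_i = \bar{y}\): because the model includes an intercept, the least squares residuals \(y_i - \hat{y}_i\) sum to zero, so the fitted values \(\hat{y}_i\) have the same mean as the response values \(y_i\).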
Therefore, the correlation simplifies to: \[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{y})}{\sqrt{\text{TSS} \cdot \sum_{i=1}^n (\hat{y}_i - \bar{y})^2}}\]
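The identity \(R^2 = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2\) can be sanity-checked numerically before finishing the algebra. The sketch below uses simulated data; the predictors, coefficients, and sample size are arbitrary assumptions.

```r
# Numerical check that 1 - RSS/TSS equals the squared correlation of y and fitted values.
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)     # multiple regression with an intercept
y_hat <- fitted(fit)

rss <- sum((y - y_hat)^2)
tss <- sum((y - mean(y))^2)

r2_def <- 1 - rss / tss    # definition of R^2
r2_cor <- cor(y, y_hat)^2  # squared correlation

c(definition = r2_def, squared_cor = r2_cor, lm_summary = summary(fit)$r.squared)
```

All three values should coincide up to floating-point error, and `mean(y_hat)` should equal `mean(y)`, which is the fact used in the simplification above.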